Supplementary Material: Scaling Up Approximate Value Iteration with Options: Better Policies with Fewer Iterations

Authors

  • Timothy A. Mann
  • Shie Mannor
Abstract

… $\sum_{t=1}^{\infty} \gamma^{t}\,(P^{o_1} P^{o_2} \cdots P^{o_t})(Y \mid x)$ for all $Y \subseteq X$ and $x \in X$. We will assume throughout this supplementary material that when we refer to an optimal policy $\pi^*$, it is a policy over primitive actions. Because we have assumed that $O$ contains the set of primitive actions $A$, the fixed point of the SMDP Bellman operator and of the MDP Bellman operator is the optimal value function $V^*$. Thus $T_{\pi^*}$ is equivalent to $T^{\pi^*}$.
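
Because the option set O is assumed to contain every primitive action, the SMDP and MDP Bellman operators share the fixed point V∗. The following minimal Python sketch illustrates that claim numerically; the four-state chain, the hand-built two-step option, and all variable names are assumptions made for this sketch, not constructions taken from the paper.

    import numpy as np

    # Toy 4-state chain MDP (illustrative assumption, not from the paper).
    gamma, n = 0.9, 4
    P = np.zeros((2, n, n))                # P[a] is the transition matrix for action a
    for x in range(n):
        P[0, x, max(x - 1, 0)] = 1.0       # action 0: move left
        P[1, x, min(x + 1, n - 1)] = 1.0   # action 1: move right
    r = np.zeros((2, n))
    r[1, n - 2] = 1.0                      # reward 1 for taking "right" in state n-2

    def mdp_bellman(V):
        """MDP Bellman operator T over primitive actions only."""
        return np.max(r + gamma * P @ V, axis=0)

    # SMDP option models: every primitive action plus one 2-step "right, right" option.
    # An option's model is its expected discounted reward R_o and discounted transition F_o.
    options = [(r[a], gamma * P[a]) for a in range(2)]   # primitive actions viewed as options
    R_rr = r[1] + gamma * P[1] @ r[1]                    # 2-step discounted reward
    F_rr = (gamma ** 2) * P[1] @ P[1]                    # 2-step discounted transition
    options.append((R_rr, F_rr))

    def smdp_bellman(V):
        """SMDP Bellman operator over the option set O, which contains A."""
        return np.max([R_o + F_o @ V for R_o, F_o in options], axis=0)

    V_mdp, V_smdp = np.zeros(n), np.zeros(n)
    for _ in range(200):
        V_mdp, V_smdp = mdp_bellman(V_mdp), smdp_bellman(V_smdp)

    # Both operators converge to the same fixed point V* because O contains all of A.
    print(np.allclose(V_mdp, V_smdp, atol=1e-6))  # True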

Similar articles

Scaling Up Approximate Value Iteration with Options: Better Policies with Fewer Iterations

We show how options, a class of control structures encompassing primitive and temporally extended actions, can play a valuable role in planning in MDPs with continuous state-spaces. Analyzing the convergence rate of Approximate Value Iteration with options reveals that for pessimistic initial value function estimates, options can speed up convergence compared to planning with only primitive act...
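
As a rough illustration of the claim about pessimistic initial estimates, the sketch below runs exact value iteration on a toy chain starting from V0 = 0 (a lower bound on V∗ since all rewards are nonnegative), once with only the primitive action and once with an added multi-step option. The chain, the five-step option, and the stopping tolerance are assumptions made for this sketch, not the experiments of the paper.

    import numpy as np

    # Illustrative chain MDP (an assumption for this sketch, not the paper's benchmark).
    gamma, n, eps = 0.95, 30, 1e-3
    idx = np.minimum(np.arange(n) + 1, n - 1)
    P_right = np.eye(n)[idx]                      # row x puts mass 1 on state min(x+1, n-1)
    r_right = np.zeros(n); r_right[n - 2] = 1.0   # reward 1 for moving right out of state n-2

    def backup_primitive(V):
        """One Bellman backup using only the primitive 'right' action."""
        return r_right + gamma * P_right @ V

    # Discounted SMDP model of a hand-built option: "go right for 5 steps".
    k, R_opt, F_opt = 5, np.zeros(n), np.eye(n)
    for _ in range(k):
        R_opt = r_right + gamma * P_right @ R_opt
        F_opt = gamma * P_right @ F_opt

    def backup_with_options(V):
        """Backup over the option set O = {primitive 'right', 5-step option}."""
        return np.maximum(backup_primitive(V), R_opt + F_opt @ V)

    # Reference solution V* (exact for this deterministic chain after enough sweeps).
    V_star = np.zeros(n)
    for _ in range(2000):
        V_star = backup_primitive(V_star)

    def iterations_to_eps(backup):
        V, t = np.zeros(n), 0                     # pessimistic initialisation V0 = 0 <= V*
        while np.max(np.abs(V - V_star)) > eps:
            V, t = backup(V), t + 1
        return t

    print("primitive only:", iterations_to_eps(backup_primitive))
    print("with options  :", iterations_to_eps(backup_with_options))

On this chain the option backups propagate value several states per sweep, which is why far fewer iterations are needed than with primitive backups alone.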

Iterative Algorithms to Approximate Canonical Gabor Windows: Computational Aspects

In this paper we investigate the computational aspects of some recently proposed iterative methods for approximating the canonical tight and canonical dual window of a Gabor frame (g, a, b). The iterations start with the window g while the iteration steps comprise the window g, the k iterand γk, the frame operators S and Sk corresponding to (g, a, b) and (γk, a, b), respectively, and a number o...
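
As a point of reference, the sketch below approximates the canonical dual window γ = S⁻¹g of a finite, discrete Gabor frame with a generic fixed-point (Richardson) iteration, γ_{k+1} = γ_k + λ(g − Sγ_k), which converges whenever 0 < λ < 2/‖S‖. The signal length, lattice parameters, Gaussian window, and step size are assumptions made for illustration; this is not the specific family of iterations analyzed in the paper above.

    import numpy as np

    # Finite discrete Gabor system on C^L with time shift a and frequency shift b
    # (all illustrative assumptions for this sketch).
    L, a, b = 48, 4, 4
    l = np.arange(L)
    g = np.exp(-np.pi * ((l - L / 2) ** 2) / (L / 2))    # Gaussian window
    g /= np.linalg.norm(g)

    # All Gabor atoms g_{m,n}[l] = exp(2*pi*i*m*b*l/L) * g[(l - n*a) mod L].
    atoms = np.array([np.exp(2j * np.pi * m * b * l / L) * np.roll(g, n * a)
                      for n in range(L // a) for m in range(L // b)])
    S = atoms.T @ atoms.conj()                            # frame operator as an L x L matrix

    # Frame bounds give a safe step size for the Richardson iteration.
    eig = np.linalg.eigvalsh(S)
    lam = 2.0 / (eig.min() + eig.max())

    gamma_k = g.astype(complex)                           # iterand, started at the window g
    for _ in range(50):
        gamma_k = gamma_k + lam * (g - S @ gamma_k)

    print(np.linalg.norm(S @ gamma_k - g))                # residual shrinks toward 0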

Empirical Results on Convergence and Exploration in Approximate Policy Iteration

In this paper, we empirically investigate the convergence properties of policy iteration applied to the optimal control of systems with continuous state and action spaces. We demonstrate that policy iteration requires fewer iterations than value iteration to converge, but requires more function evaluations to generate cost-to-go approximations in the policy evaluation step. Two different alter...
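
A small self-contained comparison on a random MDP illustrates the trade-off described above: policy iteration converges in far fewer iterations, but each of its iterations solves a full linear system for policy evaluation, whereas value iteration only applies a cheap Bellman backup. The MDP size, random seed, and stopping rules are assumptions made for this sketch.

    import numpy as np

    rng = np.random.default_rng(0)
    nS, nA, gamma, eps = 50, 4, 0.95, 1e-6
    P = rng.dirichlet(np.ones(nS), size=(nA, nS))   # P[a, s] is a distribution over next states
    R = rng.random((nA, nS))

    def value_iteration():
        V, t = np.zeros(nS), 0
        while True:
            V_new = np.max(R + gamma * P @ V, axis=0)
            t += 1
            if np.max(np.abs(V_new - V)) < eps:
                return t
            V = V_new

    def policy_iteration():
        pi, t = np.zeros(nS, dtype=int), 0
        while True:
            # Policy evaluation: solve (I - gamma * P_pi) V = R_pi exactly.
            P_pi, R_pi = P[pi, np.arange(nS)], R[pi, np.arange(nS)]
            V = np.linalg.solve(np.eye(nS) - gamma * P_pi, R_pi)
            pi_new = np.argmax(R + gamma * P @ V, axis=0)
            t += 1
            if np.array_equal(pi_new, pi):
                return t
            pi = pi_new

    print("VI iterations:", value_iteration())   # typically hundreds at gamma = 0.95
    print("PI iterations:", policy_iteration())  # typically fewer than 10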

Forward Search Value Iteration for POMDPs

Recent scaling up of POMDP solvers towards realistic applications is largely due to point-based methods which quickly converge to an approximate solution for medium-sized problems. Of this family, HSVI, which uses trial-based asynchronous value iteration, can handle the largest domains. In this paper we suggest a new algorithm, FSVI, that uses the underlying MDP to traverse the belief space towa...
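
The sketch below illustrates the general idea of using the underlying MDP to traverse the belief space: actions are chosen greedily with respect to a Q-function computed on the hidden states, while the belief a POMDP agent would hold is tracked by Bayes' rule and collected as candidate backup points. The tiny random POMDP, the horizon, and the helper names are assumptions made for illustration, not the FSVI implementation.

    import numpy as np

    rng = np.random.default_rng(1)
    nS, nA, nO, gamma, H = 4, 2, 3, 0.9, 8
    T = rng.dirichlet(np.ones(nS), size=(nA, nS))   # T[a, s]  : distribution over next states
    Z = rng.dirichlet(np.ones(nO), size=(nA, nS))   # Z[a, s'] : distribution over observations
    R = rng.random((nA, nS))

    # Solve the underlying (fully observable) MDP by value iteration to get Q_MDP.
    V = np.zeros(nS)
    for _ in range(500):
        V = np.max(R + gamma * T @ V, axis=0)
    Q_mdp = R + gamma * T @ V                        # shape (nA, nS)

    def belief_update(b, a, o):
        """Bayes' rule: b'(s') proportional to Z[a, s', o] * sum_s T[a, s, s'] b(s)."""
        b_next = Z[a, :, o] * (b @ T[a])
        return b_next / b_next.sum()

    # Forward trajectory: act greedily w.r.t. Q_MDP at the *hidden* state,
    # but track the belief that a POMDP agent would hold along the way.
    s, b = 0, np.full(nS, 1.0 / nS)
    belief_points = [b]
    for _ in range(H):
        a = int(np.argmax(Q_mdp[:, s]))              # MDP-greedy action at the true state
        s = rng.choice(nS, p=T[a, s])                # sample the next hidden state
        o = rng.choice(nO, p=Z[a, s])                # sample an observation from the new state
        b = belief_update(b, a, o)
        belief_points.append(b)

    print(np.round(np.array(belief_points), 3))      # beliefs to use as backup points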

TD(0) Leads to Better Policies than Approximate Value Iteration

We consider approximate value iteration with a parameterized approximator in which the state space is partitioned and the optimal cost-to-go function over each partition is approximated by a constant. We establish performance loss bounds for policies derived from approximations associated with fixed points. These bounds identify benefits to having projection weights equal to the invariant distr...
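
To make the setup concrete, the sketch below runs approximate value iteration with a piecewise-constant approximator: each Bellman backup is projected back onto a fixed state partition by a weighted average within each block. The random MDP, the two-states-per-block partition, and the uniform projection weights are assumptions made for this sketch; the result quoted above concerns choosing those weights as the invariant distribution.

    import numpy as np

    rng = np.random.default_rng(2)
    nS, nA, gamma = 20, 3, 0.9
    P = rng.dirichlet(np.ones(nS), size=(nA, nS))
    R = rng.random((nA, nS))

    blocks = [np.arange(i, i + 2) for i in range(0, nS, 2)]   # partition: pairs of states
    w = np.full(nS, 1.0 / nS)                                  # projection weights (uniform here)

    def project(V):
        """Replace V on each block by the w-weighted average over that block."""
        V_proj = np.empty_like(V)
        for blk in blocks:
            V_proj[blk] = np.average(V[blk], weights=w[blk])
        return V_proj

    # Approximate value iteration: projected Bellman backups until a fixed point.
    V = np.zeros(nS)
    for _ in range(1000):
        V = project(np.max(R + gamma * P @ V, axis=0))

    greedy_policy = np.argmax(R + gamma * P @ V, axis=0)       # policy derived from the fixed point
    print(np.round(V, 3))
    print(greedy_policy)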

Publication date: 2014